ABSTRACT


This paper addresses the problem of a catastrophic node failure in a contract net. STRIPS is distributed by means of the contract net protocol and serves as a medium for studying the methods that might be used to recover from the node failure.

Four methods are described and compared with respect to their normal and recovery operational costs.

This paper was completed under the supervision of Randall Davis, Assistant Professor, Electrical Engineering and Computer Science.
1. DESCRIPTION OF STRIPS


The Stanford Research Institute Problem Solver, STRIPS,[1] is included in this paper as a medium for studying methods of recovering from catastrophic failure of nodes in a contract net.

STRIPS uses a resolution theorem prover to determine the differences between a current state and a goal state, and it uses the prover again to verify a correct means of moving from one state to the other. STRIPS also has a list of operators, each with its preconditions and effects, which can change the states.

Here is a brief description of how a problem is solved. A present world model M0 and a goal state G0 are made out of well-formed formulas (WFFs). The theorem prover is asked to show that G0 is included in M0. If it succeeds, the problem is solved; if it does not, the incomplete proof is used as the difference between M0 and G0. STRIPS then goes to its collection of operators to see if any are relevant to reducing this difference. Of the operators having the desired effect, the operator O whose preconditions contain the fewest literals is chosen as the best to try first. A new world model is constructed by applying the operator to the old world model, M1 ← O(M0). STRIPS is called recursively, with the problem now being to show that G0 is included in M1. Termination occurs when a world model is found that includes the goal state; the operators are then returned in the proper order.
Two issues should be discussed further. First, there may be several operators which can reduce the difference. Second, the operators themselves have preconditions which may not be immediately met in M0.

When there is a tie between several operators, STRIPS arbitrarily picks one. Should this choice prove fruitless, STRIPS backtracks and tries the next best choice. It fails to solve a problem if no operators can be found to reduce the differences.

The preconditions of an operator must be met before the operator can be applied. STRIPS handles this by setting up the preconditions as a goal state and trying to reduce the difference between this goal state and the current world model.
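The search described so far (difference, relevant operators ordered by fewest precondition literals, preconditions treated as a subgoal, backtracking on failure) can be sketched in Python. This is a simplified stand-in under stated assumptions, not the original STRIPS program: literals are ground strings, the "difference" is the set of goal literals missing from the model rather than an incomplete resolution proof, and a depth limit stands in for a real termination policy. The names `strips`, `apply_op`, and the two-operator domain are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Model = FrozenSet[str]


@dataclass(frozen=True)
class Operator:
    name: str
    pre: FrozenSet[str]      # precondition literals
    add: FrozenSet[str]      # add list
    delete: FrozenSet[str]   # delete list


def difference(model: Model, goal: FrozenSet[str]) -> FrozenSet[str]:
    # Stand-in for the incomplete proof: the goal literals the model lacks.
    return goal - model


def apply_op(op: Operator, model: Model) -> Model:
    return (model - op.delete) | op.add


def strips(model: Model, goal: FrozenSet[str], ops: List[Operator],
           depth: int = 6) -> Optional[Tuple[List[str], Model]]:
    """Return (operator names in order of application, final model) or None."""
    diff = difference(model, goal)
    if not diff:
        return [], model                        # goal is included in the model
    if depth == 0:
        return None                             # give up on this branch
    relevant = [op for op in ops if op.add & diff]
    relevant.sort(key=lambda op: len(op.pre))   # fewest precondition literals first
    for op in relevant:                         # later entries = backtracking
        sub = strips(model, op.pre, ops, depth - 1)   # preconditions as subgoal
        if sub is None:
            continue
        pre_plan, pre_model = sub
        rest = strips(apply_op(op, pre_model), goal, ops, depth - 1)
        if rest is None:
            continue
        rest_plan, final_model = rest
        return pre_plan + [op.name] + rest_plan, final_model
    return None


# Hypothetical domain: the robot must reach box B before pushing it to c.
ops = [
    Operator("goto(a, b)", frozenset({"ATR(a)"}), frozenset({"ATR(b)"}),
             frozenset({"ATR(a)"})),
    Operator("push(B, b, c)", frozenset({"ATR(b)", "AT(B, b)"}),
             frozenset({"ATR(c)", "AT(B, c)"}), frozenset({"ATR(b)", "AT(B, b)"})),
]
plan, final = strips(frozenset({"ATR(a)", "AT(B, b)"}), frozenset({"AT(B, c)"}), ops)
print(plan)   # ['goto(a, b)', 'push(B, b, c)']
```

Only push adds the missing literal, so its preconditions become the subgoal; goto is found for that subgoal, and the two plans are concatenated around push.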

The following example gives a clearer description of the problem-solving process.


Figure 1. Example of Search Tree
M0 and G0 are formulated. The theorem prover is asked to show that G0 is included in M0. When it is unable to do so, the incomplete proof is used as the difference D0 between M0 and G0. STRIPS then goes to its pool of operators to find those which can reduce D0, and finds that Oa and Ob are relevant to the problem, with Oa being the better choice.

The preconditions for Oa are set up as a subgoal Ga, which is placed before G0 in the goal list. The theorem prover is asked if M0 includes Ga. In this case it does, so Oa is applied to M0, forming M1; Ga is removed from the goal list, and STRIPS is called on (M1, G0). After determining the difference between M1 and G0, it is found that no operators are relevant, so Ob is then tried. Once again a subgoal, Gb, is constructed from the preconditions of the operator Ob and is placed before the final goal state. STRIPS finds that Gb is not included in M0 and that the difference between them can be reduced by operator Oc. The subgoal Gc is constructed from the preconditions of Oc and is placed before Gb in the goal list. It is found that Gc is included in M0, so Oc is applied to M0, forming M2. M2 includes Gb, so Ob is applied to M2 and M3 is formed. At the last node the theorem prover finds that G0 is included in M3; therefore the process terminates with Oc, Ob as the correct operator sequence. STRIPS uses best-first search to determine which node to expand.
2. DESCRIPTION OF CONTRACT NET


STRIPS lends itself to distribution and parallelism in several areas.

(1) The differences between states could possibly be handled more efficiently if processors could specialize in a class of differences.

(2) Parallelism can be introduced to look simultaneously into the possibilities generated by having several relevant operators available.

(3) Once an operator Oi has been selected there are two subproblems, (M0, Gi) and (Mi, G0), which may be handled concurrently.

In order to achieve these possible gains I will use a contract net.[2]

A contract net is a set of nodes or processors that are able to communicate with each other. Communication is achieved by exchanging messages which contain information relevant to the particular tasks the nodes are sharing, and it is standardized by means of a protocol.

When a node has a task that it wishes to have some assistance in completing, it issues a task announcement. A task announcement can be directed to any subset of the nodes in the net.

The task announcement has a header, which contains the sender, the intended recipient of the message, the message type and a contract number. It also has slots, which contain a description of the task, the requirements a node must meet to perform the task, the information the announcer wishes to know about a bidding node, and an expiration time. Communication is considered a valuable resource, so the information in these slots is given by keywords and abbreviations known to the nodes.


To
From
Type
Contract
Task Abstraction
Eligibility Specification
Bid Specification
Expiration Time

Figure 2. Task Announcement Format


A node which thinks it can perform the task described in the task abstraction, and which meets the eligibility specification, answers the announcement with a bid.
To
From
Type
Contract
Node Abstraction

Figure 3. Bid Format


The node abstraction slot contains the information requested by the announcing node in the bid specification slot. The announcing node uses this information as its criteria for selecting the node with which it wishes to share the task. When the announcing node has made its selection, it sends an award message to that node. The award message has a task specification slot, which contains all the information necessary for completion of the task.

This exchange of task announcement, bid and award messages is the protocol for establishing a contract. The announcing node is the manager of the task and the bidding node is the contractor of the task. The contractor can then play the role of manager itself by contracting out portions of its task.
To
From
Type
Contract
Task Specification

Figure 4. Award Format

When a task has been completed by a node, the node sends its results to the manager in a report message. The report contains a result description slot that specifies the results of the execution.


To
From
Type
Contract
Result Description

Figure 5. Report Message


A report message can be either an interim report or a final report, depending upon the stage of completion at which it is issued.

The manager also has the option of terminating a contract before it has been completed, by means of a termination message. When a contractor receives this message it stops working on that contract and cancels all related subcontracts.


Figure 6. Contract Message Protocol
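The header and slots of Figures 2 to 5, and one announce, bid, award round, can be sketched as follows. This is a minimal illustration, not a specification of the protocol: the `Message` class, the `establish_contract` helper, the capability sets used for eligibility, and the "award to the first bidder" rule are all hypothetical placeholders (bid selection proper is discussed in Section 3).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Message:
    # Header shared by every message type (Figures 2 to 5).
    to: str
    sender: str                  # the "From" field ("from" is a Python keyword)
    kind: str                    # TASK ANNOUNCEMENT, BID, AWARD, REPORT, TERMINATION
    contract: str
    slots: Dict[str, object] = field(default_factory=dict)


def establish_contract(manager: str, contract: str, abstraction: str,
                       eligibility: Set[str],
                       nodes: Dict[str, Set[str]]) -> List[Message]:
    """One announce/bid/award round; returns the messages in the order sent."""
    log = [Message("*", manager, "TASK ANNOUNCEMENT", contract,
                   {"Task Abstraction": abstraction,
                    "Eligibility Specification": sorted(eligibility)})]
    for node, capabilities in nodes.items():
        if eligibility <= capabilities:          # node meets the eligibility spec
            log.append(Message(manager, node, "BID", contract,
                               {"Node Abstraction": sorted(capabilities)}))
    bids = [m for m in log if m.kind == "BID"]
    if bids:
        winner = bids[0].sender                  # placeholder selection rule
        log.append(Message(winner, manager, "AWARD", contract,
                           {"Task Specification": abstraction}))
    return log


log = establish_contract("NODE 1", "A1", "D0", {"THEOREM PROVER XX"},
                         {"NODE 2": {"THEOREM PROVER XX"}, "NODE 3": {"OTHER"}})
for m in log:
    print(m.kind, m.sender, "->", m.to)
```

NODE 3 lacks the required theorem prover, so only NODE 2 bids and receives the award; a report and any termination messages would follow the same `Message` shape.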


3. CONTRACT NET DISTRIBUTION OF STRIPS

The nodes must have a common theorem prover and a list of operators relevant to the task. The theorem prover will be required in the eligibility specification of a task announcement. The operators themselves will be contained in the nodes, so that when a node receives an announcement it can determine whether it has any operators relevant to reducing the differences in the task announcement. As an abstraction of the task, the difference between M0 and G0 will be included in the task abstraction slot of the task announcement. The bid specification will require that bids contain, in their node abstractions, when the node will be available, the number of literals in the precondition clauses of the relevant operator, and the particular instantiation of that operator that will be used. The award message will contain in its task specification slot the current world model M0 and the goal state G0. When the contractor reports, the result description slot will contain the list of operators that reduce the difference, in the order of application.
To: *
From: NODE 1
Type: TASK ANNOUNCEMENT
Contract: A1
Task Abstraction:
    TASK TYPE: DIFFERENCE REDUCTION
    ABSTRACTION: D0
Eligibility Specification:
    MUST HAVE: THEOREM PROVER XX
Bid Specification:
    WHEN AVAILABLE
    NUMBER OF LITERALS
    RELEVANT INSTANTIATION
Expiration Time:
    12:04:36 2-6-79

Figure 7. Task Announcement Example
To: NODE 1
From: NODE 2
Type: BID
Contract: A1
Node Abstraction:
    12:04:39 2-6-79
    3
    O1(A, B)

Figure 8. Bid Example


To: NODE 2
From: NODE 1
Type: AWARD
Contract: A1
Task Specification:
    (M0, G0)

Figure 9. Award Example
To: NODE 1
Type: FINAL REPORT
Contract: A1
Result Description:
    (O1, O2, O3)

Figure 10. Report Example


Before the discussion of the problem-solving process can continue, some issues must be resolved or clarified.

(1) What information is needed to allow a node to make an intelligent bid?

(2) What information is needed for the manager to distribute the task intelligently?

(3) When should a manager terminate a contract?

(4) Is it appropriate to distribute a task into two subtasks, where the first examines the differences between a goal state and an intermediate goal, and the second examines the differences between a world model formed by the operator related to that intermediate goal state and the next goal state?


The task abstraction contains the information pertinent to a bidder selecting a task. The task type gives the class of problem the bidder is expected to work on. The difference D0 is the minimum information the bidder needs to determine whether its operators are relevant to the task. With this information a bidder can decide whether it wants to do the task and determine which instantiation of its operators is appropriate.

The manager will use the information requested in the bid specification to evaluate each node's usefulness. There are two issues here. What constitutes a more useful node? When should contracts be established with less useful nodes?

In STRIPS, the heuristic for determining the best node to expand was to choose the one whose operator preconditions had the fewest literals. This criterion will be used again here. STRIPS, however, did not need to know the name of the operator, since it had only one list of operators to choose from. The contract net may have many nodes with the same capabilities that will give bids having the same number of literals. It would be inappropriate to contract with all of them, since they would all be doing the same thing. The particular operator instantiation is therefore required to allow the manager to avoid this excessive redundancy. The when-available information allows a manager to break a tie between nodes having the same least number of literals and the same particular operator instantiation.

If two bids are still tied after all this, the manager arbitrarily picks one.
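The selection rules above (fewest literals, one contractor per distinct instantiation, availability as the tie-breaker, then an arbitrary pick) can be sketched as below. The `BidInfo` record and `select_contractors` function are hypothetical; availability is kept as an "HH:MM:SS" string from the same day, which sorts correctly as text, and the bidding node names are made up.

```python
from typing import Dict, List, NamedTuple


class BidInfo(NamedTuple):
    node: str
    literals: int        # precondition literals of the offered operator
    instantiation: str
    available: str       # "HH:MM:SS", same day assumed


def select_contractors(bids: List[BidInfo]) -> List[BidInfo]:
    """Keep one bid per distinct instantiation, most useful first."""
    best: Dict[str, BidInfo] = {}
    for bid in bids:
        current = best.get(bid.instantiation)
        # Fewer literals wins, then earlier availability; an exact tie keeps
        # the bid seen first, which plays the role of the arbitrary pick.
        if current is None or (bid.literals, bid.available) < (current.literals,
                                                               current.available):
            best[bid.instantiation] = bid
    return sorted(best.values(), key=lambda b: (b.literals, b.available))


bids = [
    BidInfo("N2", 2, "push(Box 2, m, b)", "12:05:36"),
    BidInfo("N5", 2, "push(Box 2, m, b)", "12:05:06"),   # same offer, free sooner
    BidInfo("N6", 3, "push(Box 1, m, d)", "12:04:59"),
]
print([b.node for b in select_contractors(bids)])   # ['N5', 'N6']
```

N2 and N5 offer the same instantiation, so only the earlier-available N5 is kept; N6 is retained as the less useful alternative.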

STRIPS problems are nondeterministic, in that a best choice at a particular level of search may not lead to the answer. Therefore, it is necessary to keep track of all possible courses of action, and it may be useful to expand them, and thus to contract with less useful nodes.

A problem arises here. Some tasks may have an infinite number of solutions. If a manager waits for all of these to come back, it will be waiting a very long time.

In this paper, it is assumed that there are ample nodes available, so a manager will be allowed to contract with the most useful nodes and with all the alternatives. However, when the manager receives a solution to the task from one of its contractors, it will terminate the remaining outstanding contracts. This is an adequate approach because it will be assumed, for this paper, that a node reporting before another has the better solution.

This distributed STRIPS establishes an and/or goal tree. In addition to terminating contracts as discussed above, a manager will want to terminate the remaining contracts of an and branch when one of the nodes in that branch reports that it has failed to accomplish its contract.
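The two termination rules can be sketched on an and/or tree whose leaves report after some number of time units: an or node is decided by its first successful child (the other alternatives are terminated), and an and node is decided by its first failing child (its siblings are terminated) or else by its last child to succeed. The `Goal` class, `resolve` function and all report times are hypothetical, and all processors are assumed to run at the same rate.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Goal:
    kind: str                       # "leaf", "and", or "or"
    time: int = 0                   # leaf: time units until it reports
    ok: bool = True                 # leaf: whether its report is a success
    children: List["Goal"] = field(default_factory=list)


def resolve(g: Goal) -> Tuple[int, bool]:
    """(time of the deciding report, success), children running in parallel."""
    if g.kind == "leaf":
        return g.time, g.ok
    results = [resolve(c) for c in g.children]
    wins = [t for t, ok in results if ok]
    losses = [t for t, ok in results if not ok]
    if g.kind == "or":
        # First success ends the or node; the remaining alternatives are terminated.
        return (min(wins), True) if wins else (max(losses), False)
    # and node: first failure ends it and terminates its siblings.
    return (min(losses), False) if losses else (max(wins), True)


def leaf(t: int, ok: bool = True) -> Goal:
    return Goal("leaf", t, ok)


def both(*children: Goal) -> Goal:
    return Goal("and", children=list(children))


# Three alternatives: a dead end, a short solution, and a longer one.
root = Goal("or", children=[
    both(leaf(1, False), leaf(1, False)),
    both(leaf(2), leaf(2)),
    both(leaf(1), leaf(3)),
])
print(resolve(root))   # (2, True)
```

The dead-end branch fails at time 1, the second branch succeeds at time 2, and the third branch, still running at that point, would be terminated.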

The world models, goal states and operators used as examples later are simple enough that an operator can be applied to a world model without its preconditions being met. This facilitates the added parallelism of breaking a task into concurrent subtasks. This is not necessarily true for all problems that can be handled by STRIPS.

An illustration of the process follows.
Figure 11. Distributed STRIPS and/or Goal Tree
The problem is reducing the difference between M0 and G0. Operators Oa, Oc and Of are equally relevant at the first level. Oa is a dead end and will not result in a solution. Oc is included in the solution Ob, Oc, Od. Of is the first operation in the solution Of, Og, Oh, Oi.


N1 receives the original task to reduce the difference between M0 and G0. It computes the difference D0 between them and announces the task to reduce D0. Nodes N2, N3 and N4 examine D0 and see that they have relevant operators Oa, Oc and Of, and N1 contracts with them to reduce the difference using their respective operators.

N2 establishes goal Ga from the preconditions of Oa. It computes the difference between M0 and Ga and announces the task of reducing it. N2 then constructs Ma by applying Oa to M0 and announces the task of reducing the difference between Ma and G0. In this example no nodes can reduce either of the differences, so N2 reports to N1 that it has failed. If N1 had any other contracts which depended upon N2's results, it would now terminate them.

N3 establishes Gc from the preconditions of Oc. It computes the difference between M0 and Gc and announces the task of reducing that difference. A node that has operator Ob relevant to the task bids, and N3 contracts with it to reduce that difference. This contractor establishes Gb from the preconditions of Ob and determines that there is no difference between M0 and Gb. It then computes Mb by applying Ob to M0 and finds that there is no difference between Mb and Gc. Its subtasks completed, it reports to N3, giving Ob as its result. Meanwhile, another contractor of N3 had been working on a similar problem, announced by N3 after it had computed Mc by applying Oc to M0. Working in the same manner, this second contractor has determined that Od is a good operator, because there is no difference between Mc and Gd or between Md and G0, so it reports the operator Od to N3. N3, receiving operator Ob from its first contractor and Od from its second, combines them with Oc and reports to N1 that (Ob, Oc, Od) satisfies its contract.

While N3 had been working, N4 had been laboring on its own contract. The process below N4 operates in the same manner as for the previously discussed nodes, with all of the leaf nodes in its branch of the tree having no difference between their respective current world model and goal state. N4 reports to N1 that (Of, Og, Oh, Oi) satisfies its contract. It is important to know what happens if N3 or N4 reports before the other.

If N3 reports first, N1 notifies N4 that it wishes to terminate its contract. N4, in turn, notifies its own contractors that it wishes to terminate their contracts, and so on down the line.

It has been assumed that all processors work at the same rate, so the best solution has been picked in this case. Clearly, if processors do not work at the same rate, some work will have to be done to determine the optimum solution.

The search pattern has now been changed from the best-first search employed by STRIPS to a breadth-first search. In this case the optimum solution has been found in the time STRIPS would take to find it if STRIPS made all the right choices.


4. DETAILED EXAMPLE OF DISTRIBUTED STRIPS


The example used by Fikes and Nilsson to illustrate the operation of STRIPS will be used to give a more detailed illustration of distributed STRIPS.

The problem is to determine a plan which a robot, in a room with three boxes, could use to arrange the boxes so that they are all at the same location.


Figure 12. Picture of Example Problem


The clauses used to describe the world model simply give the positions of the robot and the three boxes.

(1) ATR(x): The robot is at location x.

(2) AT(y, z): The object y is at location z.

There are two operators which can change the world model. An operator makes the change by deleting from the world model the clauses contained in the operator's delete list, and by adding the clauses in its add list.

(1) push(k, m, n): Robot pushes object k from location m to location n.

    Precondition: AT(k, m) ∧ ATR(m)
    Delete List: ATR(p), AT(k, p)
    Add List: AT(k, n), ATR(n)

(2) goto(m, n): Robot goes from location m to location n.

    Precondition: ATR(m)
    Delete List: ATR(p)
    Add List: ATR(n)

The delete lists have been modified with respect to the original example. The variable p is used only as a placeholder: when AT(k, p) or ATR(p) is deleted from the world model, the location of the object or robot is irrelevant to the deletion. Applying goto(j, b) to a world model will result in the robot being placed at b regardless of the value of j.
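The placeholder delete lists can be illustrated with a short Python sketch in which a world model is a frozen set of tuples. The functions `push` and `goto` here are hypothetical renderings of the two operators; preconditions are deliberately not checked, matching the statement that these simple operators can be applied without their preconditions being met.

```python
from typing import FrozenSet, Tuple

Fact = Tuple[str, ...]          # ("ATR", x) or ("AT", obj, loc)
Model = FrozenSet[Fact]


def push(model: Model, k: str, m: str, n: str) -> Model:
    # Delete list ATR(p), AT(k, p): p is a placeholder, so every robot position
    # and every position of object k is removed; m plays no part in the deletion.
    kept = {f for f in model
            if f[0] != "ATR" and not (f[0] == "AT" and f[1] == k)}
    return frozenset(kept | {("AT", k, n), ("ATR", n)})


def goto(model: Model, m: str, n: str) -> Model:
    # Delete list ATR(p): the robot ends up at n whatever m is.
    return frozenset({f for f in model if f[0] != "ATR"} | {("ATR", n)})


M0 = frozenset({("ATR", "a"), ("AT", "Box 1", "b"),
                ("AT", "Box 2", "c"), ("AT", "Box 3", "d")})
M1 = push(M0, "Box 2", "c", "b")
print(sorted(M1))
print(sorted(goto(M0, "j", "b")))   # j is ignored; the robot is now at b
```

Applying push(Box 2, c, b) to M0 moves both the robot and Box 2 to b even though the robot starts at a, which is what lets the precondition subtask and the post-operator subtask proceed concurrently.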

The purpose of this modification is to allow two differences to be reduced concurrently: the difference between the current world model and the operator's preconditions, and the difference between the new world model, formed by applying the operator, and the goal state.

The initial world model of this task is given by

M0: ATR(a)
    AT(Box 1, b)
    AT(Box 2, c)
    AT(Box 3, d)


The goal of the task is described by

G0: (∃x)(AT(Box 1, x) ∧ AT(Box 2, x) ∧ AT(Box 3, x)).

This problem is interesting as an example because there is an infinite number of possible solutions, and because there are many solutions with the same level of complexity, even at the optimum.

STRIPS uses the QA3.5 theorem prover[4] to determine the differences between states. Only those differences, not their computation, will be repeated here.

The first node, Node 1, which assumes the role of overall task manager, determines that the difference between M0 and G0 is

¬AT(Box 1, c) ∨ ¬AT(Box 3, c)
¬AT(Box 2, b) ∨ ¬AT(Box 3, b)
¬AT(Box 1, d) ∨ ¬AT(Box 2, d)
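The three clauses above have a simple reading: for each location x that already holds a box, they list the AT literals still missing for every box to be at x. The sketch below reproduces the clauses that way. It is only an illustration, assuming the bindings of x are restricted to locations already occupied by a box; it is not how QA3.5 derives the difference, and `goal_difference` is a hypothetical name.

```python
from typing import Dict, List


def goal_difference(box_at: Dict[str, str]) -> List[str]:
    """One clause per location x already holding a box: the missing
    AT(box, x) literals, written as a disjunction of negated literals."""
    clauses = []
    for x in sorted(set(box_at.values())):
        missing = [f"¬AT({box}, {x})" for box in sorted(box_at) if box_at[box] != x]
        clauses.append(" ∨ ".join(missing))
    return clauses


M0_boxes = {"Box 1": "b", "Box 2": "c", "Box 3": "d"}
for clause in goal_difference(M0_boxes):
    print(clause)
```

The same function applied to the world model after push(Box 2, c, b) (Boxes 1 and 2 at b, Box 3 at d) gives the two-clause difference that Node 2 computes later in the example.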

Node 1 needs to have this difference reduced, so it issues a task announcement.

To: *
From: NODE 1
Type: TASK ANNOUNCEMENT
Contract: 1
Task Abstraction:
    TASK TYPE: DIFFERENCE REDUCTION
    ABSTRACTION: ¬AT(Box 1, c) ∨ ¬AT(Box 3, c)
                 ¬AT(Box 2, b) ∨ ¬AT(Box 3, b)
                 ¬AT(Box 1, d) ∨ ¬AT(Box 2, d)
Eligibility Specification:
    MUST HAVE: THEOREM PROVER QA3.5
Bid Specification:
    WHEN AVAILABLE
    NUMBER OF LITERALS
    RELEVANT INSTANTIATION
Expiration Time:
    12:04:36 2-6-79

The only relevant operator is push(k, m, n), but there are six relevant instantiations: push(Box 1, m, c), push(Box 1, m, d), push(Box 2, m, b), push(Box 2, m, d), push(Box 3, m, b) and push(Box 3, m, c).
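A bidding node can derive these instantiations by matching each literal missing from the world model (each negated literal of the difference) against the operator's add list. The sketch below does this with a minimal flat-tuple matcher; `unify` and `relevant_instantiations` are hypothetical helpers, variables are marked with a leading "?", and variables that no literal binds (here m) are left symbolic, as in the bid.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Literal = Tuple[str, ...]


def unify(pattern: Literal, fact: Literal) -> Optional[Dict[str, str]]:
    """Match a flat pattern (variables start with '?') against a ground literal."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    binding: Dict[str, str] = {}
    for p, f in zip(pattern[1:], fact[1:]):
        if p.startswith("?"):
            if binding.setdefault(p, f) != f:   # same variable, two values
                return None
        elif p != f:
            return None
    return binding


def relevant_instantiations(name: str, params: Sequence[str],
                            add_list: Sequence[Literal],
                            missing: Sequence[Literal]) -> List[str]:
    """Instantiations whose add list supplies a literal the model is missing."""
    found: List[str] = []
    for lit in missing:
        for pattern in add_list:
            b = unify(pattern, lit)
            if b is not None:
                args = ", ".join(b.get(v, v.lstrip("?")) for v in params)
                inst = f"{name}({args})"
                if inst not in found:
                    found.append(inst)
    return found


# push(k, m, n) has add list AT(k, n), ATR(n); the missing literals are the
# negated literals of the difference between M0 and G0.
missing = [("AT", "Box 1", "c"), ("AT", "Box 3", "c"), ("AT", "Box 2", "b"),
           ("AT", "Box 3", "b"), ("AT", "Box 1", "d"), ("AT", "Box 2", "d")]
insts = relevant_instantiations("push", ["?k", "?m", "?n"],
                                [("AT", "?k", "?n"), ("ATR", "?n")], missing)
print(insts)
print([2] * len(insts))   # precondition AT(k, m) ∧ ATR(m) has two literals
```

The six instantiations and the literal counts (2, 2, 2, 2, 2, 2) match the node abstraction of the bid that Node 2 returns.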

An operator having several relevant instantiations is a difficult issue to resolve. It will be assumed that enough nodes will bid to allow Node 1 to contract each instantiation to a different node. If not, Node 1 will be required to remember the uncontracted instantiations, and the nodes that offered them, in case the selected instantiations fail.

The operation following each node below Node 1 will be identical, except for the different variables used as parameters to the different instantiations. Expanding one will be representative of them all.


To: NODE 1
From: NODE 2
Type: BID
Contract: 1
Node Abstraction:
    12:05:06 2-6-79
    (2, 2, 2, 2, 2, 2)
    (push(Box 1, m, c), push(Box 1, m, d),
     push(Box 2, m, b), push(Box 2, m, d),
     push(Box 3, m, b), push(Box 3, m, c))

Node 1 now seals the contract with the award message.

To: NODE 2
From: NODE 1
Type: AWARD
Contract: 1-3
Task Specification:
    push(Box 2, m, b)
    ((ATR(a) AT(Box 1, b)
      AT(Box 2, c) AT(Box 3, d)),
     (∃x)(AT(Box 1, x) ∧ AT(Box 2, x) ∧ AT(Box 3, x)))

The instantiation push(Box 2, m, b) has been included in the task specification so that Node 2 will know which of the instantiations it offered it is to use. The suffix -3 has been appended to the contract number so that Node 1 can keep track of which instantiation it gave to Node 2.

Expansion of goal tree so far.


When Node 2 receives the award, it uses AT(Box 2, c) from M0 to refine the operator instantiation to push(Box 2, c, b). It breaks the task into two subtasks. The first is to see whether the operator preconditions are met in M0. The second is to see whether G0 is included in the world model formed by applying push(Box 2, c, b) to M0.

Node 2 determines that the difference between M0 and the preconditions is ¬ATR(c). It issues a task announcement.

To: *
From: NODE 2
Type: TASK ANNOUNCEMENT
Contract: 2
Task Abstraction:
    TASK TYPE: DIFFERENCE REDUCTION
    ABSTRACTION: ¬ATR(c)
Eligibility Specification:
    MUST HAVE: THEOREM PROVER QA3.5
Bid Specification:
    WHEN AVAILABLE
    NUMBER OF LITERALS
    RELEVANT INSTANTIATION
Expiration Time:
    12:05:36 2-6-79

The only instantiation, goto(m, c), leads to an optimum solution.

To: NODE 2
From: NODE 3
Type: BID
Contract: 2
Node Abstraction:
    12:06:06 2-6-79
    2
    goto(m, c)


Node 2 answers with
To: NODE 3
From: NODE 2
Type: AWARD
Contract: 2-1
Task Specification:
    goto(a, c)
    ((ATR(a) AT(Box 1, b)
      AT(Box 2, c) AT(Box 3, d)),
     (ATR(c)))

Before announcing the second subtask, Node 2 forms another world model, M1, by applying push(Box 2, c, b) to M0:

M1: ATR(b)
    AT(Box 1, b)
    AT(Box 2, b)
    AT(Box 3, d)

It finds that the difference between M1 and G0 is

¬AT(Box 1, d) ∨ ¬AT(Box 2, d)
¬AT(Box 3, b).

The only relevant operator is push(k, m, n), with three instantiations: push(Box 1, m, d), push(Box 2, m, d) and push(Box 3, m, b). The first two will lead to non-optimum solutions. Node 2 issues the task announcement.

To: *
From: NODE 2
Type: TASK ANNOUNCEMENT
Contract: 3
Task Abstraction:
    TASK TYPE: DIFFERENCE REDUCTION
    ABSTRACTION: ¬AT(Box 3, b)
                 ¬AT(Box 1, d) ∨ ¬AT(Box 2, d)
Eligibility Specification:
    MUST HAVE: THEOREM PROVER QA3.5
Bid Specification:
    WHEN AVAILABLE
    NUMBER OF LITERALS
    RELEVANT INSTANTIATION
Expiration Time:
    12:06:36 2-6-79

Node 4, among others, makes a bid.

To: NODE 2
From: NODE 4
Type: BID
Contract: 3
Node Abstraction:
    12:07:06 2-6-79
    (2, 2, 2)
    (push(Box 1, m, d), push(Box 2, m, d), push(Box 3, m, b))
Node 2 selects Node 4 to work on push(Box 3, m, b).

To: NODE 4
From: NODE 2
Type: AWARD
Contract: 3-3
Task Specification:
    push(Box 3, m, b)
    ((ATR(b) AT(Box 1, b)
      AT(Box 2, b) AT(Box 3, d)),
     (∃x)(AT(Box 1, x) ∧ AT(Box 2, x) ∧ AT(Box 3, x)))
To: *
From: NODE 4
Type: TASK ANNOUNCEMENT
Contract: 4
Task Abstraction:
    TASK TYPE: DIFFERENCE REDUCTION
    ABSTRACTION: ¬ATR(d)
Eligibility Specification:
    MUST HAVE: THEOREM PROVER QA3.5
Bid Specification:
    WHEN AVAILABLE
    NUMBER OF LITERALS
    RELEVANT INSTANTIATION
Expiration Time:
    12:07:36 2-6-79

Node 4 forms a new world model, M2, by applying push(Box 3, d, b) to the world model it received:

M2: ATR(b)
    AT(Box 1, b)
    AT(Box 2, b)
    AT(Box 3, b)

Node 4 finds no difference between this model and G0, so it knows push(Box 3, d, b) is valid, provided the operator's preconditions are met.


There is one operator instantiation which will reduce the difference in the task abstraction: goto(m, d). Node 5 has goto(m, n) and bids on the task.

To: NODE 4
From: NODE 5
Type: BID
Contract: 4
Node Abstraction:
    12:08:06 2-6-79
    1
    goto(m, d)

Node 4 awards the contract to look into goto(m, d) to Node 5.

To: NODE 5
From: NODE 4
Type: AWARD
Contract: 4-1
Task Specification:
    goto(m, d)
    ((ATR(b) AT(Box 1, b)
      AT(Box 2, b) AT(Box 3, d)),
     (ATR(d)))


Node 3 uses ATR(a) to refine goto(m, c) to goto(a, c). It finds that the preconditions for goto(a, c) are met in M0. It also finds that the world model formed by applying goto(a, c) to M0 includes the goal state it received. Node 3 makes its report to Node 2.

To: NODE 2
From: NODE 3
Type: FINAL REPORT
Contract: 2-1
Result Description:
    goto(a, c)

Node 2 waits for results from Node 4.

Node 4 uses AT(Box 3, d) to refine push(Box 3, m, b) to push(Box 3, d, b). It then determines that the difference between the world model it received and the preconditions ATR(d) ∧ AT(Box 3, d) is ¬ATR(d). To reduce this difference, Node 4 makes a task announcement.




Further extension of goal tree.


Node 5 refines goto(m, d) to goto(b, d) by using ATR(b) from the old world model. It finds there is no difference between that world model and the preconditions of goto(b, d). After applying goto(b, d) to that world model to obtain a new one, Node 5 finds there is no difference between this new model and the goal state it received. It can now report to Node 4 that its result is goto(b, d).


To: NODE 4
From: NODE 5
Type: FINAL REPORT
Contract: 4-1
Result Description:
    goto(b, d)


When Node 4 receives the report from Node 5, it prepares its report to Node 2. Combining goto(b, d) and push(Box 3, d, b) yields the result goto(b, d), push(Box 3, d, b).

To: NODE 2
From: NODE 4
Type: FINAL REPORT
Result Description:
    goto(b, d), push(Box 3, d, b)

When Node 2 receives the report from Node 4, it combines the result from Node 3, its own operator push(Box 2, c, b), and the results from Node 4.

To: NODE 1
From: NODE 2
Type: FINAL REPORT
Contract: 1-3
Result Description:
    goto(a, c), push(Box 2, c, b), goto(b, d), push(Box 3, d, b)

Node 2 then terminates the contracts it has with the nodes working on push(Box 1, m, d) and push(Box 2, m, d).

Upon receipt of Node 2's report, Node 1 cancels its outstanding
contracts by sending termination messages.


5. DESCRIPTION OF PROBLEM OF FAILED NODE

It is now apparent that problem solving by a contract-net
distributed STRIPS can be represented by an n-level 2-K
and/or goal tree. Generalizing this to any hierarchical
contract net yields an n-level K-ary goal tree. Studying the
nodal relationships on such a tree allows an understanding
of the effects of a catastrophic node failure.

In the unmodified contract net, a task employs $K^n$ nodes
and has a cost $C(x)$ associated with the computing done in
achieving that task. The total cost of achieving a task is
the sum of the cost of computation and the cost associated
with the messages. Letting $TA$ be the cost of a task
announcement, $B$ the cost of a bid, $A$ the cost of an
award, and $R_p$ the cost of a report, the total cost is

$K^nTA + K^nB + K^nA + K^nR_p + C(x)$, or $K^n(TA+B+A+R_p) + C(x)$.
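The cost expression above can be checked numerically. The following is a minimal Python sketch, not part of the original work; the function and parameter names are hypothetical, and message costs are treated as abstract units. Like the formula, it charges one announcement, bid, award, and report per node employed.

```python
def baseline_cost(K: int, n: int, TA: float, B: float, A: float,
                  Rp: float, Cx: float) -> float:
    """Total cost K^n (TA + B + A + Rp) + C(x) of a task in the unmodified net.

    K  -- number of descendants per node
    n  -- depth of the goal tree
    TA, B, A, Rp -- costs of a task announcement, bid, award and report
    Cx -- cost of computation, C(x)
    """
    nodes = K ** n  # the formula charges one of each message per node employed
    return nodes * (TA + B + A + Rp) + Cx


# A binary goal tree of depth 3 with unit message costs and C(x) = 100:
print(baseline_cost(K=2, n=3, TA=1, B=1, A=1, Rp=1, Cx=100))  # prints 132
```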


Figure 13. K-ary Goal Tree


A node failing at level L will cause the contracts below it
to be lost, and may cause the system to fail. It is very
important to determine methods for recovery from such a
catastrophe. The problem is further compounded if nodes are
allowed to contract several tasks at a time: a single node
failure could then cause widespread damage among unrelated
tasks.
6. METHODS CONSIDERED FOR RECOVERY

Four methods will be considered here for recovery from nodal
failure:

(1) Rth-order redundancy.

(2) Periodic status inquiries, coupled with reannouncing
the tasks of the failed nodes.

(3) Periodic status inquiries, coupled with interim status
reports to facilitate recovery of subcontracts.

(4) Continuous communication between manager and
contractor.

The first method that comes to mind, and one that requires
no additional message types, is Rth-order redundancy. For
every task, the manager simply announces, accepts bids for,
and awards R contracts. The number of message exchanges in
this approach grows to $(RK)^n$, and the cost is
$R^nK^n(TA+B+A+R_p) + R^nC(x)$: the cost of those messages
plus the cost of computation. If a node fails, there are
R-1 nodes still available to give the needed results. This
is still an imperfect fix: in the event of a catastrophic
failure in which all R nodes fail, no recourse for recovery
is available.
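To see how quickly redundancy multiplies the cost, here is a minimal Python sketch of the Rth-order redundancy formula, together with the survival condition described above. The function names are hypothetical. With R = 1 the cost reduces to that of the unmodified net.

```python
def redundant_cost(R: int, K: int, n: int, TA: float, B: float, A: float,
                   Rp: float, Cx: float) -> float:
    """Cost R^n K^n (TA + B + A + Rp) + R^n C(x) of Rth-order redundancy."""
    exchanges = (R * K) ** n           # (RK)^n message exchanges
    return exchanges * (TA + B + A + Rp) + R ** n * Cx


def task_recoverable(replicas_alive: list) -> bool:
    """A task still yields a result if at least one of its R contractors lives."""
    return any(replicas_alive)


print(redundant_cost(R=1, K=2, n=3, TA=1, B=1, A=1, Rp=1, Cx=100))  # 132
print(redundant_cost(R=2, K=2, n=3, TA=1, B=1, A=1, Rp=1, Cx=100))  # 1056
print(task_recoverable([False, True]), task_recoverable([False, False]))  # True False
```

Doubling R here raises the cost from 132 to 1056. The second function makes the weakness plain: once every replica of a task has failed, nothing remains to recover from.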

Since communication between nodes occurs only at contract
time or when results are reported, a manager has no way to
detect that one of its contractors has failed in a
catastrophe. This necessitates the introduction of status
request messages. When a manager has waited a predetermined
time without receiving a report, it sends a status request
to its contractor. If the contractor is still functioning,
it will reply. When the manager receives the reply, it knows
all is well and continues to wait for a report. If it does
not receive a reply, it reannounces the task that it had
contracted to the failed node and continues the achievement
of its own tasks through a new subcontractor. The advantages
of this approach are that it is simple and that the status
messages can easily be formed from existing message formats.
The disadvantages are that all computation below the failed
node is still lost to the manager, and that the nodes below
it will continue to work futilely.
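The manager's side of this method can be sketched as a small simulation. This is a minimal Python sketch under stated assumptions: time advances in discrete ticks, a failed node simply never replies, and "awarding to the first live candidate" stands in for the bidding round (only live nodes can bid). The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contractor:
    name: str
    alive: bool = True

    def status_reply(self) -> Optional[str]:
        # A failed node cannot answer; None models "no reply received".
        return "BUSY" if self.alive else None


@dataclass
class Manager:
    task: str
    timeout: int                  # predetermined waiting period, in ticks
    candidates: List[Contractor]  # nodes that might bid on the task
    contractor: Optional[Contractor] = None
    waited: int = 0
    log: List[str] = field(default_factory=list)

    def announce(self) -> None:
        # Only live nodes can bid, so award to the first live candidate
        # (raises StopIteration if no candidate is alive).
        self.contractor = next(c for c in self.candidates if c.alive)
        self.waited = 0
        self.log.append(f"award {self.task} -> {self.contractor.name}")

    def tick(self, report_received: bool = False) -> None:
        if report_received:
            self.log.append("report received")
            return
        self.waited += 1
        if self.waited < self.timeout:
            return
        # Waited the predetermined time without a report: ask for status.
        if self.contractor.status_reply() is None:
            self.log.append(f"{self.contractor.name} failed; reannounce")
            self.announce()   # all work below the failed node is lost
        else:
            self.waited = 0   # all is well; keep waiting for the report


c2, c3 = Contractor("N2"), Contractor("N3")
manager = Manager("push (Box 2, m, b)", timeout=3, candidates=[c2, c3])
manager.announce()
c2.alive = False            # catastrophic failure of N2
for _ in range(3):
    manager.tick()          # no report arrives
print(manager.log)
# ['award push (Box 2, m, b) -> N2', 'N2 failed; reannounce', 'award push (Box 2, m, b) -> N3']
```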

A status request can be constructed from a task announcement
by placing the task name, STATUS REQUEST, in the task
abstraction slot and the required status in the bid
specification slot. The expiration time slot is coded for
immediate response, indicating to the contractor that, on
receiving this message, it is expected to reply before
completing any more work.
To: CONTRACTING NODE
From: MANAGING NODE
Type: TA
Contract: CURRENT CONTRACT NUMBER

Task Abstraction:
TASK TYPE: STATUS REQUEST
Bid Specification:
STATUS
Expiration Time:
IMMEDIATE RESPONSE

Figure 14. Status Request


The contractor replies with a bid whose node abstraction
contains its current status.


To: MANAGING NODE
From: CONTRACTING NODE
Type: BID
Contract: CURRENT CONTRACT NUMBER

Node Abstraction:
STATUS: BUSY

Figure 15. Status Reply

Normal protocol requires an award message, so the award
message is modified to serve as a status acknowledgement.

To: CN
From: MN
Type: AWARD
Contract: CURRENT CONTRACT NUMBER

Task Specification:
STATUS ACKNOWLEDGEMENT

Figure 16. Acknowledgement
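The point of Figures 14 to 16 is that no new message types are needed: the status exchange reuses the announcement, bid, and award formats with different slot contents. Here is a minimal Python sketch of that reuse, assuming messages are modelled as a dataclass with a free-form slot dictionary. `Message` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Message:
    to: str
    sender: str
    msg_type: str         # "TA", "BID" or "AWARD", as in the normal protocol
    contract: str
    slots: Dict[str, str] = field(default_factory=dict)


def status_request(manager: str, contractor: str, contract: str) -> Message:
    # A task announcement whose slots carry a status request (Figure 14).
    return Message(contractor, manager, "TA", contract, {
        "Task Abstraction": "TASK TYPE: STATUS REQUEST",
        "Bid Specification": "STATUS",
        "Expiration Time": "IMMEDIATE RESPONSE",
    })


def status_reply(request: Message, status: str = "BUSY") -> Message:
    # A bid whose node abstraction carries the contractor's status (Figure 15).
    return Message(request.sender, request.to, "BID", request.contract,
                   {"Node Abstraction": f"STATUS: {status}"})


def status_ack(reply: Message) -> Message:
    # An award whose task specification is only an acknowledgement (Figure 16).
    return Message(reply.sender, reply.to, "AWARD", reply.contract,
                   {"Task Specification": "STATUS ACKNOWLEDGEMENT"})


req = status_request("NODE 1", "NODE 2", "1-3")
ack = status_ack(status_reply(req))
print(ack.to, ack.msg_type, ack.contract)  # NODE 2 AWARD 1-3
```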


It would not be possible to ensure that the extra messages
are sent only when a node has failed, but heuristics could
be devised and tuned to keep their occurrence to some
tolerable level. A tolerable level of ten percent is assumed
in the subsequent analysis.

The cost of this approach is $K^n(TA+B+A+R_p)$ for normal
operation, plus $0.1K^n(TA+B+A)$ for the exchange of messages
due to status requests, plus $SK^{n-L}(TA+B+A+R_p)$ for
recontracting all tasks below the failed node, plus $C(x)$,
the cost of computation, plus $SK^{-L}C(x)$ for the additional
computation of redoing tasks. S is the number of failed nodes
and L is the level at which the node fails.


If n is large and L is small, the cost of redoing the tasks
is significant. This encourages finding a method that allows
for their recovery.
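The dependence on L can be made concrete. This minimal Python sketch evaluates the recovery term of this method for one failure at several levels, assuming unit message costs and $C(x) = 1024$ in a binary tree of depth 10. The function name and the sample values are hypothetical.

```python
def method2_recovery_cost(S, K, n, L, TA, B, A, Rp, Cx):
    """Recovery cost S(K^(n-L)(TA+B+A+Rp) + K^(-L) C(x)) of Method II."""
    redo_messages = K ** (n - L) * (TA + B + A + Rp)  # recontract the subtree
    redo_compute = Cx / K ** L                        # the subtree's share of C(x)
    return S * (redo_messages + redo_compute)


print([method2_recovery_cost(1, 2, 10, L, 1, 1, 1, 1, 1024) for L in (1, 5, 9)])
# [2560.0, 160.0, 10.0]
```

A failure near the root (L = 1) costs 256 times as much to recover as one near the leaves (L = 9), which is the motivation for recovering subcontracts instead of redoing them.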


The next approach extends the previous one so that, instead
of simple status reports containing only the health of the
contractor, interim status reports are returned that contain
outstanding subcontracts, known data, and the means of
utilizing them. When the manager detects a failed node, it
uses these interim status reports to recontract the
subcontractors and thus recover work already in progress.

The status request message is the same as in the previous
example. What changes is the meaning of the status of a
node.


In the bid format, the status reply now has in the node
abstraction a status report containing the pertinent
contracts and their utilization.
To: MANAGING NODE
From: CONTRACTING NODE
Type: BID
Contract: CURRENT CONTRACT NUMBER

Node Abstraction:
STATUS: PERTINENT CONTRACTS
A: DATA
B: (SUBCONTRACTING NODE, CONTRACT NUMBER)
C:
UTILIZATION
TASK-TYPE-NAME 1(A, TASK-TYPE-NAME 2(B, C))

Figure 17. Status Report

The award message is an acknowledgement that the status
report has been received. The complexity of the message has
increased, but with standardized task-type names, the
necessary information should be minimal.
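One way to picture the contents of Figure 17 is as a small data structure: each pertinent contract is either data the node already holds or an outstanding subcontract, and the utilization is a nested expression over their labels. This is a minimal Python sketch under that assumption. The class names, node names, and contract numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Data:
    value: str            # a result the reporting node already holds


@dataclass(frozen=True)
class Subcontract:
    node: str             # the subcontracting node
    contract: str         # its contract number


# A utilization is a label or a (task-type-name, arg, arg, ...) tuple.
Utilization = Union[str, Tuple]


@dataclass
class StatusReport:
    pertinent: Dict[str, Union[Data, Subcontract]]
    utilization: Utilization

    def outstanding(self) -> List[str]:
        """Labels of pertinent contracts that would have to be recontracted."""
        return sorted(label for label, item in self.pertinent.items()
                      if isinstance(item, Subcontract))


def render(expr: Utilization) -> str:
    """Write a utilization in the nested notation of Figure 17."""
    if isinstance(expr, str):
        return expr
    name, *args = expr
    return f"{name}({', '.join(render(a) for a in args)})"


report = StatusReport(
    pertinent={"A": Data("hypothetical data"),
               "B": Subcontract("NODE 4", "4-1"),
               "C": Subcontract("NODE 5", "4-2")},
    utilization=("TASK-TYPE-NAME-1", "A", ("TASK-TYPE-NAME-2", "B", "C")),
)
print(report.outstanding())        # ['B', 'C']
print(render(report.utilization))  # TASK-TYPE-NAME-1(A, TASK-TYPE-NAME-2(B, C))
```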

There are three issues to be considered at this point:

(1) When should status reports be issued?

(2) How will the subcontractors be contacted and recovered?

(3) What happens when a subcontractor reports between the
last status report and the nodal failure?

Reports should be issued upon request, but this does not in
itself fulfill the needs of the manager. If a node fails
between the award and the time the manager becomes
suspicious, nothing has been gained: the task will still
have to be reannounced by the manager. To correct this, an
interim status report is automatically required when a
contractor has established its own subcontracts. This
interim status report contains the same information as the
requested status reports.

When a manager detects a failed node, it goes to the most
recent status report from that node, if one exists. It
issues a task announcement similar to the one originally
issued to the failed node; the only difference is a new
contract number. The nodes bid on it as before, and the
manager selects the one it considers most appropriate. When
the award is given, the information contained in the status
report is included in the task specification. The new
contractor uses this information to establish contact with
the subcontractors.
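The recovery step can be sketched as two functions: re-awarding the task with the failed node's last report attached, and having the new contractor issue one directed RECONTRACT announcement (Figure 18) per outstanding subcontract. This is a minimal Python sketch, assuming pertinent contracts are represented as plain strings (data) or `(node, contract)` tuples (subcontracts), and that the bidding round has already chosen the winner. The names are hypothetical. The sample values are borrowed from the example in section 7, with Node 2's own operator treated as data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# A pertinent contract is known data (a string) or an outstanding
# subcontract given as (subcontracting node, old contract number).
Pertinent = Union[str, Tuple[str, str]]


@dataclass
class Award:
    to: str                                    # the new contractor
    contract: str                              # the new contract number
    task: str
    pertinent: Optional[Dict[str, Pertinent]]  # from the last status report


def reaward(task: str, new_contract: str, winner: str,
            last_report: Optional[Dict[str, Pertinent]]) -> Award:
    """Award a reannounced task, carrying the failed node's last report."""
    return Award(winner, new_contract, task, last_report)


def recontract_announcements(award: Award, numbers: List[str]) -> List[dict]:
    """One directed RECONTRACT announcement per outstanding subcontract;
    data items need no new contract. `numbers` supplies a fresh contract
    number for each announcement."""
    if not award.pertinent:
        return []  # no status report existed, so the task must be redone
    subs = [item for item in award.pertinent.values() if isinstance(item, tuple)]
    return [{"To": node, "From": award.to, "Type": "TASK ANNOUNCEMENT",
             "Contract": number,
             "Task Abstraction": "TASK TYPE: RECONTRACT",
             "Bid Specification": "STATUS",
             "Expiration Time": "IMMEDIATE RESPONSE"}
            for (node, _old), number in zip(subs, numbers)]


report = {"A": ("NODE 3", "2-1"), "B": "push (Box 2, c, b)",
          "C": ("NODE 4", "3-3")}
award = reaward("push (Box 2, m, b)", "6-1", "NODE 6", report)
print([m["To"] for m in recontract_announcements(award, ["7", "8"])])
# ['NODE 3', 'NODE 4']
```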

If a pertinent contract is data, then no new contract is
needed. If a pertinent contract is an outstanding contract,
then the new manager issues a task announcement directed to
the subcontractor, requesting that the old contract be
recontracted with it.

To: SUBCONTRACTOR
From: NEW CONTRACTOR
Type: TASK ANNOUNCEMENT
Contract: NEW CONTRACT NUMBER

Task Abstraction:
TASK TYPE: RECONTRACT
Bid Specification:
STATUS
Expiration Time:
IMMEDIATE RESPONSE

Figure 18. Recontract Announcement

The subcontractor replies with a bid that contains a current
status report in the node abstraction.

The award message is simply an acknowledgement that the bid
has been received.

A problem may arise if a subcontractor reports between the
last status report and the failure of its manager. Once the
node has reported, it forgets the contract, so when the new
contractor contacts the subcontracting node, it has no record
of the contract. To prevent this from happening, all nodes
are required to keep a record of their contracts and results
for a period of time. This should not be much of a burden on
a node, since the time for which it must maintain the record
is relatively short and memory is relatively inexpensive.
The status report it gives to the new contractor is then the
results.
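The retention requirement amounts to a small time-limited result log on each node. This is a minimal Python sketch, assuming the log is keyed by the old contract number and that time is an abstract number supplied by the caller. `ResultLog` and its methods are hypothetical names; the paper leaves the retention period open.

```python
from typing import Dict, Optional, Tuple


class ResultLog:
    """Keep each finished contract's result for `retention` time units, so a
    new contractor recontracting after the manager's failure can still get it."""

    def __init__(self, retention: float):
        self.retention = retention
        self._entries: Dict[str, Tuple[float, str]] = {}

    def record(self, contract: str, result: str, now: float) -> None:
        self._entries[contract] = (now, result)

    def lookup(self, contract: str, now: float) -> Optional[str]:
        self._purge(now)
        entry = self._entries.get(contract)
        return entry[1] if entry else None

    def _purge(self, now: float) -> None:
        # Forget results that have been held longer than the retention period.
        expired = [c for c, (t, _) in self._entries.items()
                   if now - t > self.retention]
        for c in expired:
            del self._entries[c]


log = ResultLog(retention=10)
log.record("2-1", "goto (a, b)", now=0)   # node reports to its manager
print(log.lookup("2-1", now=5))           # goto (a, b)
print(log.lookup("2-1", now=11))          # None
```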


Figure  19.  Recovery  from  Failed  Node 


The complete process is illustrated in Figure 19. N1
contracts a task with N2, which in turn contracts with N4 and
N5. Once N2 has established its subcontracts, it sends an
interim status report to N1 by means of an Interim Report.
When N1 discovers that N2 has failed, it reissues the task and
contracts with N3. N3, finding nonempty utilization and
pertinent contracts clauses in the task specification,
recontracts with N5 and N4. It issues a new interim status
report to N1, and the nodes continue to function in a normal
manner. The new contractor, instead of redoing the task
itself, simply waits for the reports from N4 and N5 and uses
the indicated utilization of them in its report.

The cost of this approach would be $K^n(TA+B+A+R_p+ISR)$ for
normal operation, plus $0.1K^n(TA+ISR+A)$ for status reports,
plus $S(K+1)(TA+B+A+R_p+ISR)$ for recovery from failure, plus
$C(x)$.

ISR is the cost associated with the interim status reports.

The cost of recovery is no longer dependent upon the failed
node's location in the tree.
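Two properties of these formulas can be checked quickly: the recovery cost has no L term, and the normal cost stays within a constant factor of Method II's. This is a minimal Python sketch, assuming unit message costs, $C(x) = 0$ so that only messages are compared, and ISR costed like a bid (the text later says ISR is on the order of a bid). The function names are hypothetical.

```python
def method2_normal(K, n, TA, B, A, Rp, Cx):
    """Method II normal cost: K^n(TA+B+A+Rp) + 0.1 K^n(TA+B+A) + C(x)."""
    Kn = K ** n
    return Kn * (TA + B + A + Rp) + 0.1 * Kn * (TA + B + A) + Cx


def method3_normal(K, n, TA, B, A, Rp, ISR, Cx):
    """Method III normal cost: K^n(TA+B+A+Rp+ISR) + 0.1 K^n(TA+ISR+A) + C(x)."""
    Kn = K ** n
    return Kn * (TA + B + A + Rp + ISR) + 0.1 * Kn * (TA + ISR + A) + Cx


def method3_recovery(K, S, TA, B, A, Rp, ISR):
    """Method III recovery cost S(K+1)(TA+B+A+Rp+ISR); note there is no L."""
    return S * (K + 1) * (TA + B + A + Rp + ISR)


# Message-only costs (C(x) = 0), unit messages, ISR costed like a bid:
for n in (4, 8, 12):
    ratio = method3_normal(2, n, 1, 1, 1, 1, 1, 0) / method2_normal(2, n, 1, 1, 1, 1, 0)
    print(n, round(ratio, 3))   # same ratio, about 1.233, at every depth
print(method3_recovery(K=2, S=1, TA=1, B=1, A=1, Rp=1, ISR=1))  # 15
```

With these assumptions, the ratio is $5.3/4.3$ regardless of tree depth, so the extra interim reports add a fixed proportion to the message cost rather than a growing one.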

There is one more special case that needs clarification. It
arises when the interim status report for a node is lost
with the failing node. This can happen either when the
manager and the contractor fail at the same time, or when a
node elects to work on a task it announced itself and then
fails.

The new contractor will be unable to get the results it
expects to use in the utilization clause. There are two
choices for recovery: the new contractor can terminate all
subcontracts in the interim status report it received and do
the task using its own methods, or it can reconstruct the
subcontract and reannounce it.

The present interim status reports contain only the
subcontract number and the node that made it. Allowing
different nodes to have different capabilities makes it
doubtful that the subcontract can be reconstructed from the
utilization clause.

When the new manager does not receive a bid in response to
the directed task announcement to recontract, it terminates
the subcontracts in the interim status report and attempts
to complete the task using the operator that was the basis
for its bid on the task in the first place.
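The decision the new contractor faces after its directed announcements can be sketched as follows. This is a minimal Python sketch under a simplifying assumption: any missing bid triggers the fallback, even though a disjunctive utilization clause might tolerate some losses. The function name and return convention are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def after_recontract(pertinent: Dict[str, Tuple[str, str]],
                     bidders: Set[str]) -> Tuple[str, List[str]]:
    """Decide how a new contractor proceeds after its directed RECONTRACT
    announcements.

    pertinent -- outstanding subcontracts from the interim status report,
                 as {label: (subcontracting node, contract number)}
    bidders   -- nodes that answered with a bid
    Returns ("resume", labels to wait on) or ("redo", nodes to terminate).
    """
    if all(node in bidders for node, _ in pertinent.values()):
        # Every subcontractor was recovered: wait for their reports.
        return "resume", sorted(pertinent)
    # Some subcontract cannot be recovered: terminate all subcontracts in the
    # report and complete the task with the operator this node bid on.
    return "redo", sorted(node for node, _ in pertinent.values())


subs = {"A": ("NODE 3", "2-1"), "C": ("NODE 4", "3-3")}
print(after_recontract(subs, {"NODE 3", "NODE 4"}))  # ('resume', ['A', 'C'])
print(after_recontract(subs, {"NODE 3"}))            # ('redo', ['NODE 3', 'NODE 4'])
```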

The last method is an extension of the previous one. The
idea of periodic interim reports is expanded to continuous
reporting. Communication links are established between the
manager and the contractor, the intention being that every
manager knows, at all times, the exact status of its
contractors. This would be achieved by extending each node's
communication capability (possibly by adding to each node
another processor whose sole purpose is to control and
coordinate communication).

The  contract  between  two  nodes  would  be  established 
in  the  normal  manner.  The  manager  would  issue  a  task 
announcement,  the  nodes  would  bid  on  the  task  and  the 
manager  would  award  the  contract  to  the  node  it  prefers. 

Once the contract has been made, a continuous communication
link is established between the manager and the contractor.
This enables the manager to know at all times the health of
the node, the stage of task completion, and all pertinent
information, such as partial results and subcontractors.
When a failure occurs, the manager knows about it
immediately. It reannounces the task with the pertinent
contracts and utilization, as discussed for the previous
approach. The new contractor contacts the subcontractors by
means of a task announcement and has the contracts remade
with itself as the recipient of further communication. The
communication links between the old contractor and the
manager and subcontractors are broken and reestablished, in
turn, with the new contractor. The cost of this system is
$K^n(TA+B+A+R_p) + C(COM) + C(x)$ for normal operation, plus
$S(K+1)(TA+B+A)$ for recovery from failure.
METHOD / NORMAL OPERATIONAL COST / RECOVERY COST

I. Rth-order redundancy
   Normal operation: $R^nK^n(TA+B+A+R_p) + R^nC(x)$
   Recovery: $0$

II. Periodic status requests coupled with reannouncing the task
   Normal operation: $1.1K^n(TA+B+A+R_p) + C(x)$
   Recovery: $S(K^{n-L}(TA+B+A+R_p) + K^{-L}C(x))$

III. Periodic status requests with interim status reports
   Normal operation: $1.1K^n(TA+B+A+R_p+ISR) - 0.1K^n(B+R_p) + C(x)$
   Recovery: $S(K+1)(TA+B+A+ISR)$

IV. Continuous communication between manager and contractor
   Normal operation: $K^n(TA+B+A+R_p) + C(x) + C(COM)$
   Recovery: $S(K+1)(TA+B+A)$

R = order of redundancy; C(COM) = cost of communication links;
K = number of descendants per node; C(x) = cost of computation;
n = depth of goal tree; L = tree depth at which failure occurs;
TA = cost of task announcement message; S = number of failures;
B = cost of bid message; ISR = cost of interim status report;
A = cost of award message; R_p = cost of report message.

Figure 20. Cost of Recovery Methods
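The orderings discussed next can be reproduced from the entries of Figure 20 for one set of sample values. This is a minimal Python sketch, not a general proof: it assumes R = 2, a binary tree of depth 4, a single failure at level 2, unit message costs with ISR equal to a bid, C(x) = 100, and an assumed C(COM) = 5000, chosen large to reflect the judgment that continuous links are costly. The function name is hypothetical.

```python
def figure20_costs(R, K, n, L, S, TA, B, A, Rp, ISR, Cx, Ccom):
    """(normal, recovery) cost of each method, following Figure 20."""
    Kn = K ** n
    return {
        "I":   (R ** n * Kn * (TA + B + A + Rp) + R ** n * Cx, 0),
        "II":  (1.1 * Kn * (TA + B + A + Rp) + Cx,
                S * (K ** (n - L) * (TA + B + A + Rp) + Cx / K ** L)),
        "III": (1.1 * Kn * (TA + B + A + Rp + ISR) - 0.1 * Kn * (B + Rp) + Cx,
                S * (K + 1) * (TA + B + A + ISR)),
        "IV":  (Kn * (TA + B + A + Rp) + Cx + Ccom,
                S * (K + 1) * (TA + B + A)),
    }


table = figure20_costs(R=2, K=2, n=4, L=2, S=1, TA=1, B=1, A=1, Rp=1,
                       ISR=1, Cx=100, Ccom=5000)
by_normal = sorted(table, key=lambda m: table[m][0])
by_recovery = sorted(table, key=lambda m: table[m][1])
print(by_normal)    # ['II', 'III', 'I', 'IV']
print(by_recovery)  # ['I', 'IV', 'III', 'II']
```

Different sample values, especially a small C(COM), can change the position of Method IV, so the ordering rests on the assumption about the cost of the links.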


Normal operation costs are incurred continuously, but
recovery costs are incurred only when a node fails;
therefore it makes sense to break the costs of the four
approaches into these two areas.

The ordering of the four methods with respect to normal
operational costs, from least expensive to most expensive,
is II, III, I, IV. The cost ISR is on the order of a bid;
therefore, for R greater than 1, I is more expensive than
either II or III. IV, because of its communication links, is
considered the most expensive of them all.

Ordering with respect to recovery cost, from least to most
expensive, yields I, IV, III, II.

Method I, Rth-order redundancy, is not considered a good
solution to the problem of node failure. It is, at best,
only a partial fix. In the event of a catastrophe,
redundancy allows survival only if one node for each task is
undamaged. Should all R nodes fail for a particular task,
the system will fail, since it has no means of recovery. It
does provide a capability for discovering erroneous reports,
but this is a different, although related, problem from the
one discussed in this paper.

Method IV, continuous communication between nodes, is quite
expensive, due to the extreme expansion of communication
requirements. It has some value where the tasks assigned to
the nodes are of such high priority that they outweigh the
cost of the communication links. This method may require the
nodes to be located near each other, and it limits the
number of nodes that the net can absorb.

Most applications would call for either Method II or Method
III. Method II, simple periodic status requests, can be
implemented more easily than III and has the lowest normal
operational overhead. Unfortunately, it has the highest
recovery costs, and these costs depend on the location in
the goal tree of the node that fails. It also loses the
benefit of any subcontractors below the failed node, forcing
the work to be redone. It is therefore best suited to
applications where there are enough nodes available that the
wasted effort is not significant, and where the tasks are of
low enough priority that the time lost to recomputation is
not important.

Method III, periodic status requests with interim status
reports, is suited to most applications between these poles.
The cost of an interim status report is roughly the same as
that of any other message, so the normal operational costs
of Method III are within a constant factor of those of
Method II. Its recovery is fast and independent of the
location of the failure, and it allows the retrieval of
subcontractors below the failed node. Considering its cost
versus effectiveness, Method III is the best, for general
use, of the four methods discussed.


7. EXAMPLE OF RECOVERY

The example in section 4 will be extended to allow recovery.
The method used is periodic status requests with interim
status reports (Method III).

The  awarding  of  contract  1-3  by  Node  1  to  Node  2  is 
accomplished  exactly  as  before.  The  task  announcement, 
bid,  and  award  are  unchanged.  Node  2  awards  contracts 
2-1,  3-3,  and  others  to  Node  3,  Node  4,  and  others  with 
the  messages  already  given. 

Once these contracts are secured, Node 2 is ready to give an
interim status report.

To: NODE 1
From: NODE 2
Type: INTERIM REPORT
Contract: 1-3

Result Description:
STATUS: PERTINENT CONTRACTS:
A: (NODE 3, 2-1)
B: push (Box 2, c, b)
C: (NODE 4, 3-3)
D: other node working on push (Box 1, m, d)
E: other node working on push (Box 2, m, d)
UTILIZATION:
(A, B, C) ∨ (A, B, D) ∨ (A, B, E)
If Node 2 had failed before the interim status report, Node 1
would have reannounced the task. The status report gives
Node 1 the information it needs to recover lower nodes.

Node 2 fails. Node 1 waits the appropriate period of time,
then issues a status request.

To: NODE 2
From: NODE 1
Type: TASK ANNOUNCEMENT
Contract: 1-3

Task Abstraction:
TASK TYPE: STATUS REQUEST
Bid Specification:
STATUS
Expiration Time:
IMMEDIATE RESPONSE

Receiving no reply, Node 1 now issues the same task
announcement it made for the contract with Node 2, except
that the contract number is different. Node 6 makes a bid
based on its relevant instantiation push (Box 2, m, b).
Node 1 awards the contract to it, including in the award
message the information in the interim status report from
Node 2.


To: NODE 6
From: NODE 1
Type: AWARD
Contract: 6-1

Task Specification:
push (Box 2, m, b)
((ATR (a) ∧ AT (Box 1, b) ∧ AT (Box 2, c) ∧ AT (Box 3, d)),
(AT (Box 1, x) ∧ AT (Box 2, x) ∧ AT (Box 3, x)))
PERTINENT CONTRACTS:
A: (NODE 3, 2-1)
B: push (Box 2, c, b)
C: (NODE 4, 3-3)
D: other node working on push (Box 1, m, d)
E: other node working on push (Box 2, m, d)
UTILIZATION:
(A, B, C) ∨ (A, B, D) ∨ (A, B, E)
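The utilization clause tells Node 6 how to assemble its report once the recovered subcontractors answer: any one alternative whose items are all available suffices. This is a minimal Python sketch, assuming the first alternative to complete is the one used (as Node 2 did in section 4 when it terminated the other contracts). The function name is hypothetical, and the result strings are taken from Node 2's final report in section 4.

```python
from typing import Dict, List, Optional, Sequence


def first_complete(utilization: Sequence[Sequence[str]],
                   results: Dict[str, str]) -> Optional[List[str]]:
    """Return the results of the first alternative whose items are all
    available, or None while no alternative is complete."""
    for alternative in utilization:
        if all(label in results for label in alternative):
            return [results[label] for label in alternative]
    return None


utilization = [("A", "B", "C"), ("A", "B", "D"), ("A", "B", "E")]
results = {"A": "goto (a, c)", "B": "push (Box 2, c, b)"}
print(first_complete(utilization, results))   # None: nothing complete yet
results["C"] = "goto (b, d), push (Box 3, d, b)"  # Node 4 reports
print(", ".join(first_complete(utilization, results)))
# goto (a, c), push (Box 2, c, b), goto (b, d), push (Box 3, d, b)
```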

Node 6 must establish contracts with the nodes listed in the
task specification. For example, the message to Node 3 is:

To: NODE 3
From: NODE 6
Type: TASK ANNOUNCEMENT
Contract: 7

Task Abstraction:
TASK TYPE: RECONTRACT
Bid Specification:
STATUS
Expiration Time:
IMMEDIATE RESPONSE

Node 3, Node 4, and the others reply with bids containing
their current status in their node abstractions. After
Node 6 acknowledges these recontractings and sends a new
interim status report to Node 1, the net continues its
normal operation.
8. SUMMARY


Interest in distributed processing has been growing due to
the availability of increasingly powerful microprocessors.
The contract net has been proposed as a method for achieving
the control and coordination it requires.

The issue of catastrophic node failure must be resolved for
the contract net to be practical.

Four methods for recovery from a failure have been
discussed:

(1) Rth-order redundancy. The manager contracts with R nodes
to complete the task.

(2) Simple periodic status requests. The manager checks a
contractor after a period of time. If it has failed, the
task is reannounced.

(3) Periodic status requests with interim status reports.
The manager gets information from status reports that
allows it to recover nodes below the failure.

(4) Continuous communication links. The manager receives
continuous updates on the condition and progress of the
contractor.

The first does not solve the problem. The fourth is too
expensive, except for small nets with high-priority tasks.
The second is inexpensive and easily implemented, but it
cannot recover nodes below the failure.

The third, periodic status requests with interim status
reports, because it is inexpensive and effective, is the
best of the four for general use.

9. HISTORY OF PROBLEM

There have been many attempts at planning, with varying
levels of success. GPS,5 STRIPS, ABSTRIPS,6 and NOAH7 were
studied for suitability as a medium for examining the issue
of node failure. Although NOAH had been distributed in other
work,8 STRIPS was chosen for this paper. Distributing it is
sufficiently involved to bring out the issues, yet simple
enough to generate understandable examples. The author of
this paper knows of no other literature on the problem of
node failure.


10. SUGGESTIONS FOR FUTURE WORK


Distributing STRIPS has brought out some issues that were
not dealt with in this paper.

The suitability of STRIPS for distribution is still
questionable. Distributing the alternative choices for
reducing a difference is reasonable and clean, but the
distribution generated by breaking a task into subtasks
before and after operator application was achieved only by
deemphasizing the importance of the delete list. The trick
employed may not be valid for all applications of STRIPS.

Prior to the distribution of alternative choices is the
problem of selection. This paper simply selected them all,
ignoring both the finite number of nodes in the net and the
possibly wide disparity between alternatives. Examining all
alternatives could lead to super-saturating the net. It may
be worthwhile to use only a fraction of the alternatives,
with the fraction determined by the complexity of the better
alternatives, or at least to hold off investigating an
alternative that is thought to be very unlikely. Nodes with
more than one operator instantiation pose a problem for the
manager that might be better handled by the contractor.

The solution to the problem of failed nodes asserted by this
paper requires that work be done in developing heuristics
for determining tolerable waiting periods. Waiting too long
results in excessive lost time, and waiting too short
results in excessive message exchanges. It would be
preferable to find heuristics that are independent of the
class of tasks given to the nodes.

There are problems related to node failure that have not
been treated at all. Lost or garbled messages might be
handled by the inclusion of a repeat message: after
receiving a garbled message, or after an excessive wait for
a message, a node could ask for the message to be repeated.
For the class of problems where checking a solution is
easier than finding it, erroneous data from a node could be
detected by requiring the manager to check the solution it
is given against the task it contracted out.
11. FOOTNOTES

1.  See  reference  2. 

2.  See  reference  5. 

3.  See  pgs.  200-203  in  reference  2. 

4.  This  method  was  suggested  in  reference  5. 

5.  See  reference  6. 

6.  See  reference  3. 

7.  See  reference  5. 

8.  See  reference  1. 